Emin Erkan Korkmaz and Göktürk Üçoluk (1997). A Method for Improving Automatic Word Categorization

Author

  • Emin Erkan Korkmaz
Abstract

This paper presents a new approach to automatic word categorization that improves both the efficiency of the algorithm and the quality of the formed clusters. The unigram and bigram statistics of a corpus of about two million words are used with an efficient distance function to measure the similarities of words, and a greedy algorithm puts the words into clusters. Fuzzy clustering notions such as cluster prototypes and degree of membership are used to form the clusters. The algorithm is unsupervised, and the number of clusters is determined at run time.
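The abstract does not spell out the distance function or the cluster update rules, so the following is only a minimal, hypothetical sketch of the general scheme it describes, not the authors' method. Each word is represented by the distribution of its immediate left and right neighbours (bigram statistics); words are visited in order of unigram frequency and greedily attached to the nearest cluster prototype, or used to seed a new cluster when no prototype is close enough, so the number of clusters emerges at run time. L1 distance between context distributions is a swapped-in stand-in for the paper's distance function, and the function names, the `threshold` parameter, and the membership formula 1 − d/2 are all assumptions.

```python
from collections import Counter, defaultdict
import math


def context_vectors(tokens, vocab):
    """Count each vocab word's immediate left and right neighbours (bigram statistics)."""
    vecs = {w: Counter() for w in vocab}
    for prev, nxt in zip(tokens, tokens[1:]):
        if nxt in vecs:
            vecs[nxt][("L", prev)] += 1  # prev occurs directly to the left of nxt
        if prev in vecs:
            vecs[prev][("R", nxt)] += 1  # nxt occurs directly to the right of prev
    return vecs


def normalize(counts):
    """Turn raw counts into a probability distribution over context features."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def l1_distance(p, q):
    """Assumption: L1 distance stands in for the paper's distance function; range [0, 2]."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def mean_vector(vectors):
    """Cluster prototype: the unweighted mean of its members' context distributions."""
    out = defaultdict(float)
    for v in vectors:
        for k, x in v.items():
            out[k] += x / len(vectors)
    return dict(out)


def greedy_cluster(tokens, threshold=1.0, top_n=None):
    """Hypothetical greedy clustering; the number of clusters emerges at run time."""
    freq = Counter(tokens)  # unigram statistics
    vocab = [w for w, _ in freq.most_common(top_n)]  # most frequent words first
    vecs = {w: normalize(c) for w, c in context_vectors(tokens, vocab).items()}
    clusters = []  # each cluster: {"prototype": dist, "members": {word: membership}}
    for w in vocab:
        best, best_d = None, math.inf
        for c in clusters:
            d = l1_distance(vecs[w], c["prototype"])
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            # Assumption: degree of membership = 1 - d/2, so 1.0 at the prototype.
            best["members"][w] = 1.0 - best_d / 2.0
            best["prototype"] = mean_vector([vecs[m] for m in best["members"]])
        else:
            # No prototype is close enough: this word seeds a new cluster.
            clusters.append({"prototype": dict(vecs[w]), "members": {w: 1.0}})
    return clusters


if __name__ == "__main__":
    corpus = "the cat sat on the mat . the dog sat on the rug .".split()
    for cluster in greedy_cluster(corpus, threshold=1.2):
        print(cluster["members"])
```

On the toy corpus in the `__main__` block with `threshold=1.2`, the four nouns cat, mat, dog and rug end up in one cluster, while "the", "sat", "on" and the full stop each form their own; lowering the threshold tends to produce more, smaller clusters. For simplicity, memberships are recorded when a word is assigned and are not recomputed as the prototype drifts.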


Similar resources

A Method for Improving Automatic Word Categorization

Korkmaz, Emin Erkan. M.S., Department of Computer Engineering. Supervisor: Asst. Prof. Dr. Göktürk Üçoluk. September 1997, 57 pages. In this thesis study, a new approach to automatic word categorization is presented which improves both the efficiency of the algorithm and the quality of the formed clusters. The unigram and the bigram statistics ...


Choosing a Distance Metric for Automatic Word Categorization

Emin Erkan Korkmaz, Göktürk Üçoluk. Department of Computer Engineering, Middle East Technical University, Ankara, Turkey. Abstract: This paper analyzes the functionality of different distance metrics that can be used in a bottom-up unsupervised algorithm for automatic word categorization. The proposed method uses a mod...



Controlled Genetic Programming Search for Solving Deceptive Problems: A Thesis Submitted to the Graduate School of Natural and Applied Sciences

Korkmaz, Emin Erkan. Ph.D., Department of Computer Engineering. Supervisor: Assoc. Prof. Dr. Göktürk Üçoluk. March 2003, 77 pages. Traditional Genetic Programming randomly combines subtrees by applying crossover. There is a growing interest in methods that can control such recombination operations. In this thesis, a new approach ...


Improving automatic image annotation: Approach by Bag-Of-Key Point

Automatic image annotation associates each image with a set of keywords describing its visual content, using an automatic system without any human intervention. Many approaches have been proposed for the realization of such a system; however, it is still inefficient in terms of the semantic description of the image. Recent works show a frequent use of a special technique known as bag...




Publication date: 1997